Lecture 3: KL-divergence and connections

Author

  • David Witmer
Abstract

1 Recap

Recall some important facts about entropy and mutual information from the previous lecture:

• H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)
• I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X,Y)
• I(X;Y|Z) = H(X|Z) − H(X|Y,Z)
• I(X;Y) = 0 if X and Y are independent
• I(X;Y) ≥ 0 or, equivalently, H(X) ≥ H(X|Y)

Exercise 1.1 Prove that H(X|Y) = 0 if and only if X = g(Y) for some function g.

2 More mutual information

2.1 Mutual information chain rule

We begin by proving the chain rule for mutual information.

Theorem 2.1 (Chain rule for mutual information)
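For reference, the chain rule states that I(X_1, …, X_n; Y) = Σ_{i=1}^n I(X_i; Y | X_1, …, X_{i−1}). The recap identities are also easy to check numerically; below is a minimal Python sketch, where the joint table p_xy is a made-up example rather than anything from the lecture:

import numpy as np

# An arbitrary example joint distribution of (X, Y); any table of
# probabilities summing to 1 works here.
p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])

def H(p):
    """Shannon entropy in bits, with the convention 0 * log 0 = 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x = p_xy.sum(axis=1)  # marginal of X
p_y = p_xy.sum(axis=0)  # marginal of Y

# Chain rule for entropy: H(X,Y) = H(Y) + H(X|Y), so H(X|Y) = H(X,Y) - H(Y).
H_x_given_y = H(p_xy) - H(p_y)
H_y_given_x = H(p_xy) - H(p_x)

# The three equivalent expressions for mutual information agree.
i1 = H(p_x) - H_x_given_y
i2 = H(p_y) - H_y_given_x
i3 = H(p_x) + H(p_y) - H(p_xy)
assert np.isclose(i1, i2) and np.isclose(i2, i3)
print(f"I(X;Y) = {i1:.4f} bits")  # nonnegative, so H(X) >= H(X|Y)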


Similar resources

Adaptive Data Analysis

These lecture notes are based on [BNS+16] and were compiled for a guest lecture in the course CS229r "Information Theory in Computer Science" taught by Madhu Sudan at Harvard University in Spring 2016. Menu for today's lecture:

• Motivation
• Model
• Overfitting & comparison to non-adaptive data analysis
• What can we do adaptively?
• KL divergence recap
• Proof
• Differential privacy (time per...


3 Proof of Theorem 1 using the Primal-Dual method

In this section we show how the refined upper bound on the regret of the EXP algorithm, proved using the potential function approach (KL divergence), also gives us a better bound for the expert game setup with bandit feedback. Last lecture we showed that in the case of expert prediction with bandit feedback using the Exp3 algorithm, the regret is upper bounded by T^{2/3} n^{1/3} using a rough upper boun...
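For concreteness, here is a minimal Exp3 sketch: exponential weights mixed with uniform exploration, updated with importance-weighted loss estimates. The parameter names and toy data are illustrative, not the lecture's notation:

import numpy as np

rng = np.random.default_rng(0)

def exp3(loss_matrix, eta, gamma):
    """Sample an arm from exponential weights mixed with uniform
    exploration, observe only that arm's loss (bandit feedback), and
    update with an importance-weighted loss estimate."""
    T, n = loss_matrix.shape
    log_w = np.zeros(n)  # log-weights, for numerical stability
    total_loss = 0.0
    for t in range(T):
        w = np.exp(log_w - log_w.max())
        p = (1 - gamma) * w / w.sum() + gamma / n  # mix in exploration
        i = rng.choice(n, p=p)
        loss = loss_matrix[t, i]       # only this entry is observed
        total_loss += loss
        log_w[i] -= eta * loss / p[i]  # multiplicative update on the estimate
    return total_loss

# Toy run: arm 0 is slightly better on average.
T, n = 5000, 3
losses = rng.random((T, n))
losses[:, 0] *= 0.8
print("Exp3 total loss:", exp3(losses, eta=0.01, gamma=0.05))
print("best single arm:", losses.sum(axis=0).min())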


CS 674/INFO 630: Advanced Language Technologies

P_θ : V → [0, 1], where θ = (θ[1], …, θ[m]) is an element of the m-dimensional probability simplex. Hence the probability assigned to a single term v_j is defined as P_θ(v_j) := θ[j]. Also recall from the previous lecture that the Kullback–Leibler (KL) divergence between two probability distributions P_θ and P_θ′, i.e. the expected log-likelihood ratio with respect to P_θ, is defined as: D(P_θ ‖ P_θ...
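For a finite vocabulary this expected log-likelihood ratio is a single sum, D(P_θ ‖ P_θ′) = Σ_j θ[j] log(θ[j]/θ′[j]). A small Python sketch, where the distributions p and q are made-up examples:

import numpy as np

def kl(p, q):
    """D(p || q) = sum_j p[j] * log(p[j] / q[j]): the expectation of the
    log-likelihood ratio under p. Infinite if q gives zero mass to a term
    that p does not; terms with p[j] = 0 contribute 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    support = p > 0
    if np.any(q[support] == 0):
        return np.inf
    return float(np.sum(p[support] * np.log(p[support] / q[support])))

p = [0.5, 0.3, 0.2]  # an example theta
q = [0.4, 0.4, 0.2]  # an example theta'
print(kl(p, q), kl(q, p))  # generally unequal: KL is not symmetric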


Notes on Kullback-Leibler Divergence and Likelihood

The Kullback-Leibler (KL) divergence is a fundamental equation of information theory that quantifies the proximity of two probability distributions. Although it is difficult to grasp by examining the equation alone, an intuition for the KL divergence arises from its intimate relationship with likelihood theory. We discuss how KL divergence arises from likelihood theory in an attempt t...
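One way to see the relationship: for i.i.d. data drawn from a distribution p, the expected per-sample log-likelihood ratio between the true model p and an alternative q is exactly D(p ‖ q). A short numerical check of this fact, with made-up example distributions:

import numpy as np

rng = np.random.default_rng(1)

# Two example categorical distributions.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

exact = np.sum(p * np.log(p / q))  # D(p || q)

# Draw i.i.d. samples from p. The average log-likelihood ratio
# log p(x)/q(x) converges to D(p || q) by the law of large numbers,
# so the true model wins the likelihood comparison at rate D(p || q).
x = rng.choice(len(p), size=200_000, p=p)
empirical = np.log(p[x] / q[x]).mean()

print(f"exact D(p||q)       = {exact:.4f}")
print(f"mean log-lik. ratio = {empirical:.4f}")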


Kullback-Leibler Divergence for Nonnegative Matrix Factorization

The I-divergence, or unnormalized generalization of the Kullback–Leibler (KL) divergence, is commonly used in Nonnegative Matrix Factorization (NMF). This divergence has the drawback that its gradients with respect to the factorizing matrices depend heavily on the scales of the matrices, and learning the scales in gradient-descent optimization may require many iterations. This is often handled by expl...
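For context, the objective in question is D(V ‖ WH) = Σ (V log(V/WH) − V + WH), usually minimized with the classic multiplicative updates. Below is a generic sketch of that baseline (toy data and parameters are illustrative; this is not the normalization scheme the abstract alludes to):

import numpy as np

rng = np.random.default_rng(2)

def nmf_kl(V, r, iters=200, eps=1e-9):
    """NMF minimizing the I-divergence D(V || WH) with Lee-Seung-style
    multiplicative updates, which sidestep step-size selection."""
    m, n = V.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(iters):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

V = rng.random((20, 30))
W, H = nmf_kl(V, r=5)
WH = W @ H
i_div = np.sum(V * np.log((V + 1e-9) / (WH + 1e-9)) - V + WH)
print("I-divergence after fitting:", i_div)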


Publication date: 2013